Sir:

Prior to calculation of the filing fees and initial examination of the application, 
please amend the above-identified application as follows: 
IN THE CLAIMS : 

Please amend claims 3, 4, 6, 7, 10 & 11 as follows: 

3. (Amended) The method as claimed in claim 1, characterized in that 
furthermore the value of the bias of the receiving neuron is adapted in step a3).

4. (Amended) A method for training a neural network in accordance with 
the preamble of claim 1 and if desired with the characterizing parts of claim 1, 
characterized in that the training of the neural network comprises a structure 
simplification procedure, that is to say the location and elimination of synapses that 
have no significant influence on the curve of the risk function in that 

b1) one selects a synapse,

b2) one assumes that said synapse does not have a significant influence on the
curve of the risk function,

b3) one interrupts said synapse,

b4) one compares the reaction of the neural network changed in accordance with
step b3) with the reaction of the unchanged neural network, and

b5) if the variation of the reaction does not exceed a predetermined level, one
decides to keep the change made in step b3).

6. (Amended) The method as claimed in claim 1, characterized in that the value of
a likelihood function is calculated for the neural network to represent the reaction of the 
neural network. 

7. (Amended) The method as claimed in claim 1, characterized in that the 
structure variants of the neural network are compared using a significance test. 

10. (Amended) The method as claimed in claim 1, characterized in that, to 
compare two structure variants of the neural network, the ratio of the values of the 
likelihood functions for said two structure variants is calculated. 

11. (Amended) A method for training a neural network in accordance with the 
preamble of claim 1 and if desired with the characterizing parts of claim 1, characterized
in that the training of the neural network comprises an optimization procedure in which 
the strengths of the individual synapses, that is to say the strengths of the connections 
between the neurons, are optimized, and in that the simplex method which is known per 
se is used for said optimization. 



REMARKS 

Claims 1-11 are pending in this application. By this Amendment, claims 3, 4, 6,
7, 10 & 11 are amended to correct the multiple dependency thereof and to place this
application into better condition for examination. No new matter is added. 



Respectfully submitted, 
a5) if the variation of the reaction does not exceed a
predetermined level, one decides to keep the change
made in step a3).

2. The method as claimed in claim 1, characterized 
in that the two sending neurons are located on one and 
the same layer. 

3. The method as claimed in claim 1 or 2,
characterized in that furthermore the value of the bias
of the receiving neuron is adapted in step a3).

4. A method for training a neural network in
accordance with the preamble of claim 1 and if desired
with the characterizing parts of any of claims 1 to 3,
characterized in that the training of the neural network 
comprises a structure simplification procedure, that is 
to say the location and elimination of synapses that have 
no significant influence on the curve of the risk 
function, in that 

b1) one selects a synapse,

b2) one assumes that said synapse does not have a 
significant influence on the curve of the risk 
function, 

b3) one interrupts said synapse, 

b4) one compares the reaction of the neural network 
changed in accordance with step b3) with the 
reaction of the unchanged neural network, and 

b5) if the variation of the reaction does not exceed a
predetermined level, one decides to keep the change
made in step b3).

5. The method as claimed in claim 4, characterized 
in that, when in the course of the structure 
simplification procedure n-1 synapses have already been 
eliminated and the strength of the influence of an nth 
synapse is being tested, the reaction of the neural 
network reduced by n synapses is not only compared with 
the reaction of a network reduced by only n-1 synapses, 
but also with the reaction of the neural network with its 
complete structure as present at the beginning of said 
structure simplification procedure, and in that the 
elimination of the nth synapse is only retained if the
deviation of the reaction does not exceed a predetermined
level for both comparisons.

6. The method as claimed in any of claims 1 to 5,
characterized in that the value of a likelihood function
is calculated for the neural network to represent the
reaction of the neural network.

7. The method as claimed in any of claims 1 to 6,
characterized in that the structure variants of the 
neural network are compared using a significance test. 

8. The method as claimed in claim 7, characterized 
in that the structure variants of the neural network are 
compared using the CHI-SQUARED test which is known per
se.

9. The method as claimed in claim 7, characterized 
in that the structure variants of the neural network are 
compared using the BOOTSTRAPPING method which is known
per se. 

10. The method as claimed in any of claims 1 to 9,
characterized in that, to compare two structure variants 
of the neural network, the ratio of the values of the 
likelihood functions for said two structure variants is 
calculated. 

11. A method for training a neural network in
accordance with the preamble of claim 1 and if desired
with the characterizing parts of any of claims 1 to 10,
characterized in that the training of the neural network 
comprises an optimization procedure in which the 
strengths of the individual synapses, that is to say the 
strengths of the connections between the neurons, are 
optimized, and in that the simplex method which is known 
per se is used for said optimization. 
Method for training a neural network.

Description

1. FIELD OF THE INVENTION

The invention relates to a method for training a neural network to
determine risk functions for patients following a first occurrence of a
predetermined disease on the basis of given training data records
containing objectifiable and for the most part metrologically captured
data relating to the medical condition of the patient, wherein the
neural network comprises an input layer having a plurality of input
neurons and at least one intermediate layer having a plurality of
intermediate neurons, as well as an output layer having a plurality of
output neurons, and a multiplicity of synapses which interconnect two
neurons of different layers in each case.

2. TECHNICAL BACKGROUND - PRIOR ART 

2.1. General

For large scale data analysis, neural networks have supplemented or
replaced hitherto conventional methods of analysis in many fields. It
has been shown that neural networks are better than conventional
methods at discovering and identifying hidden, not immediately evident
dependencies between individual input data in the datasets. When new
data of the same data type is input, neural networks which have been
trained using a known dataset therefore deliver more reliable results
than previous methods of analysis.

In the field of medical applications, for example, the use of neural
networks to determine a survival function for patients suffering from a
particular disease, such as cancer, is known. Said survival function
indicates the probability of a predetermined event occurring for the
patient in question depending on the time that has elapsed since the
first occurrence of the disease. Said predetermined event need not
necessarily be the death of the patient, as would be inferred from the
designation "survival function", but may be any event, for example a
recurrence of cancer.

The data records comprise a whole range of 
objectifiable information, that is to say data on whose
value any neural network operator has no influence and 
whose value can be automatically captured if desired. In 
the case of breast cancer this is information about the 
patient's personal data, such as age, sex and the like, 
information about the medical condition, such as number 
of lymph nodes affected by cancer, biological tumor 
factors such as uPA (urokinase plasminogen activator),
its inhibitor PAI-1 and similar factors, as well as 
information about the treatment method, for example type, 
duration and intensity of chemotherapy or radiotherapy. 
It goes without saying that a whole range of the 
abovementioned information, in particular the information 
about the medical condition, can only be determined using 
suitable measuring apparatus. Furthermore, the personal 
data can be automatically read in from suitable data 
media, for example machine-readable identity cards or the 
like. If they are not all available at the same time, 
which is often the case especially with laboratory 
measurements, the objectifiable data can of course be
temporarily stored in a database on a suitable storage 
medium before they are fed to the neural network as input
data.

2.2. The neural network as signal filter 

In accordance with the foregoing, therefore, it 
is possible to conceive of a neural network as a kind of 
"signal filter" that filters out a meaningful output 
signal from a noisy, and therefore as yet non-meaningful 
input signal. As with any filter, whether or how well the 
filter is able to fulfill its function depends on whether
it is possible to keep the intensity of the filter's 
intrinsic noise low enough that the signal to be filtered 
out is not lost in this intrinsic noise. 
The greater the number of data records available for training the
neural network on the one hand and the simpler the structure of the
neural network on the other hand, the lower the intensity of the
"intrinsic noise" of a neural network. Moreover, the simpler the
structure of the neural network, the greater its generalizability. In
the case of a conventional procedure in the prior art, therefore, one
part of the training of neural networks is concerned with locating and
eliminating parts of the structure that can be dispensed with for
obtaining a meaningful output signal. With this "thinning out" (also
known as "pruning" in the jargon), however, a further constraint to be
taken into account is that the structure of the neural network cannot
be "pruned" ad infinitum because as the complexity of the neural
network is reduced, its ability to map complex interrelationships, and
hence its meaningfulness, is also diminished.

2.3. Problems with medical application 

In practice, and in particular in the case of the medical application
of neural networks mentioned at the beginning, the problem is often
encountered that only very small datasets of typically a few hundred
data records are available for training the neural network. To compound
the difficulty, not only a training dataset, but also a validation
dataset and a generalization dataset must be provided for the training.
The significance of said two datasets will be discussed in greater
detail below in sections 5.5 and 5.7.

With such small datasets, the use of known pruning methods always led
to so great a simplification of the structure of the neural network
that the meaningfulness of the neural network diminished to an
unacceptable level. To nevertheless obtain neural networks that
delivered meaningful output signals after completion of the training
phase, in the prior art neural networks with a rigid, that is to say
fixed and invariable, structure were used where only small training
datasets were available. The degree of complexity, or the simplicity,
of this rigid structure was selected here on the basis of empirical
knowledge in such a way that the neural network had on the one hand a
high degree of meaningfulness while on the other hand having a still
acceptable intrinsic noise level. It has hitherto been assumed that the
specification of an invariable structure was unavoidable.

Another problem with medical applications of 
neural networks is the fact that only "censored" data are 
available for training. The term "censored" is used to 
denote the circumstance that it is not possible to 
foresee the future development for patients who have 
fortunately not yet suffered a relapse at the time of 
data capture, and statements about the survival function 
are therefore only possible up until the time the data 
were recorded. 

It goes without saying that in particular in the 
case of medical applications it is not possible to forego 
a truly meaningful result under any circumstances 
whatsoever. Under no circumstances is it namely 
acceptable for even one single patient to be denied a 
treatment simply because the neural network did not 
consider it necessary. The consequences for the patient 
could be incalculable. 

With respect to the details of the prior art 
outlined above, please see the articles listed in section 
6. "References". 

3. OBJECT OF THE INVENTION 



In the light of the above, the object of the invention is to provide an
automatic method for training a neural network to determine risk
functions for patients following a first occurrence of a predetermined
disease, which method permits, despite a low number of available
training data records, the use of a neural network having a variable
structure and the optimization of its structure in at least one
structure simplification step.

4. ACHIEVEMENT OF THE OBJECT 

According to the invention, this object is achieved by a method for
training a neural network to determine risk functions for patients
following a first occurrence of a predetermined disease on the basis of
given training data records containing objectifiable and metrologically
captured data relating to the medical condition of the patient, wherein
the neural network comprises:

- an input layer having a plurality of input neurons,
- at least one intermediate layer having a plurality of intermediate
  neurons,
- an output layer having a plurality of output neurons, and
- a multiplicity of synapses which interconnect two neurons of
  different layers in each case,

wherein the training of the neural network comprises a structure
simplification procedure, that is to say the location and elimination
of synapses that have no significant influence on the curve of the risk
function, in that one either

a1) selects two sending neurons that are connected to one and the same
    receiving neuron,
a2) assumes that the signals output from said sending neurons to the
    receiving neuron essentially exhibit the same qualitative behavior,
    that is to say are correlated to one another,
a3) interrupts the synapse of one of the two sending neurons to the
    receiving neuron and instead adapts accordingly the weight of the
    synapse of the respective other sending neuron to the receiving
    neuron,
a4) compares the reaction of the neural network changed in accordance
    with step a3) with the reaction of the unchanged neural network, and
a5) if the variation of the reaction does not exceed a predetermined
    level, decides to keep the change made in step a3),

or in that one

b1) selects a synapse,
b2) assumes that said synapse does not have a significant influence on
    the curve of the risk function,
b3) interrupts said synapse,
b4) compares the reaction of the neural network changed in accordance
    with step b3) with the reaction of the unchanged neural network, and
b5) if the variation of the reaction does not exceed a predetermined
    level, decides to keep the change made in step b3).
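
Steps b1) to b5) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the network is reduced to a dictionary of named synapse weights, `reaction` is a hypothetical callable that returns a single number representing the reaction of the network, and `max_variation` stands for the predetermined level. None of these names appear in the original text.

```python
def try_interrupt(weights, synapse, reaction, max_variation):
    """Tentatively interrupt one synapse and keep the change only if the
    reaction of the network barely moves (steps b3 to b5).

    weights       : dict mapping a synapse name to its weight (assumed layout)
    reaction      : hypothetical callable, weights -> float
    max_variation : the predetermined level of step b5
    """
    changed = dict(weights)
    changed[synapse] = 0.0                                   # b3: interrupt
    variation = abs(reaction(changed) - reaction(weights))   # b4: compare
    if variation <= max_variation:                           # b5: decide
        return changed, True
    return weights, False


# Toy reaction in which synapse "s2" hardly matters and "s1" matters a lot.
toy_reaction = lambda w: 2.0 * w["s1"] + 0.001 * w["s2"]
start = {"s1": 1.0, "s2": 1.0}

after_s2, kept_s2 = try_interrupt(start, "s2", toy_reaction, 0.01)
after_s1, kept_s1 = try_interrupt(start, "s1", toy_reaction, 0.01)
```

In this toy case the interruption of "s2" is kept and that of "s1" is rejected. Variant a1) to a5) differs only in that the weight of the second sending neuron is adapted before the comparison.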

A neural network trained in the manner described 
above assists the attending physician for example when 
deciding on the follow-up treatment for a particular 
newly operated patient. For this the physician can input 
into the neural network the patient data and the data 
metrologically captured in the laboratory relating to the 
medical condition of the first treatment, and receives 
from the neural network information about what type of 
follow-up treatment would produce the most favorable 
survival function for the patient in question. It is of 
course also possible to take account of the 
aggressiveness of the individual types of follow-up 
treatment so that, given an equally favorable or 
virtually equally favorable survival function, the least 
aggressive follow-up treatment for the patient can be 
selected. 
5. EXEMPLARY EMBODIMENT

The invention is explained in greater detail 
below with reference to an exemplary embodiment. 


5.1. Structure of neural networks 

Fig. 1 shows the structure of a neural network which is constructed in
the manner of a multi-layer perceptron. In this case the neural network
comprises:

- an input layer having a plurality of input neurons $N_i$ (i for
  "input neuron"),
- at least one intermediate layer having a plurality of intermediate
  neurons $N_h$ (h for "hidden neuron"),
- an output layer having a plurality of output neurons $N_o$ (o for
  "output neuron"), and
- a multiplicity of synapses which interconnect two neurons of
  different layers in each case.

In the simplified embodiment according to Fig. 1, on which the
following discussion will be based for the sake of clarity, only a
single intermediate layer is provided, and the neurons (or nodes as
they are also frequently called) of the output layer are connected via
synapses (also called "connectors") both to each neuron of the input
layer and to each neuron of the intermediate layer.

The number of input neurons is usually chosen depending on the number
of objectifiable items of information available. However, if the time
required for determining the reaction of the neural network should
consequently rise to an unacceptable level, then it is possible, for
example with the aid of neural networks having a greatly simplified
structure, to make a preliminary estimation of the significance of the
individual objectifiable items of information for the meaningfulness of
the overall system. It should however be stressed that this preliminary
estimate is also performed automatically and without the intervention
of the respective operator. Furthermore, the number of output neurons
is chosen to be large enough that, for the purposes of a series
expansion of the survival function, a sufficient number of series
expansion terms are available to achieve a meaningful approximation to
the actual survival function. Finally, the number of intermediate
neurons is chosen to be large enough that the results of the trained
neural network are meaningful, but small enough that the time required
to determine the result is acceptable.

WO 01/15708    PCT/EP00/08280


5.2. Function of neural networks 

5.2.1. General 

Each neuron receives a stimulation signal $S$, processes it in
accordance with a predetermined activation function $F(S)$ and outputs
a corresponding response signal $A = F(S)$ which is fed to all neurons
located below said neuron. The stimulation signal $S_y$ that acts on
the neuron $N_y$ in question is usually formed by summing the response
signals $A_x$ of the neurons $N_x$ located above said neuron $N_y$,
with the contribution of each neuron $N_x$ entering the sum multiplied
by a weighting factor $w_{xy}$ that states the strength of the synapse
connecting the two neurons.

Stimulation signal: $S_y = \sum_x w_{xy} \cdot A_x$

Response signal: $A_y = F(S_y)$
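
As a minimal sketch of the two relations above, the weighted sum and the activation of a single receiving neuron can be written as follows. The function names and the choice of the hyperbolic tangent as default activation are assumptions made for the illustration only.

```python
import math


def stimulation(weights, responses):
    """Weighted sum of the response signals of the sending neurons."""
    return sum(w * a for w, a in zip(weights, responses))


def response(stimulation_signal, activation=math.tanh):
    """Response signal obtained by applying the activation function."""
    return activation(stimulation_signal)


# Two sending neurons whose weighted contributions cancel each other.
s_y = stimulation([0.5, -0.25], [1.0, 2.0])
a_y = response(s_y)
```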

5.2.2. Input layer

The stimulation signals $S_i$ of the input neurons $N_i$ are formed by
the input data $X_{i,j}$ relating to a particular patient j.

Stimulation signal: $S_i = X_{i,j}$

In order to be able to interpret the weights of the synapses of a
neural network appropriately, it is preferable to work with variables
whose values are of the magnitude of 1. To achieve this despite the
usually very different distributions of input data, it is customary to
subject the input data to an appropriate transformation. Said
transformation is performed by the activation function $F_i$ of the
input neurons:

Response signal: $A_i = \tanh[(S_i - S_{i,\mathrm{mean}})/S_{i,Q}]$

For the input data $X_{i,j}$, therefore, firstly the mean value
$S_{i,\mathrm{mean}}$ of the patients j belonging to the training
dataset is formed. Secondly a scaling factor $S_{i,Q}$ is formed. If
the value of an input variable $X_{i,j}$ is above the mean
$S_{i,\mathrm{mean}}$, then scaling is performed in accordance with the
75% quartile. If, on the contrary, it is below the mean value, then
scaling is performed in accordance with the 25% quartile. Finally, by
using the hyperbolic tangent function as the activation function $F_i$,
scaled response signals with values in the range between -1 and +1 are
readily obtained.

Note that the above transformation can be omitted 
for input data that already exhibit the desired 
distribution, categorical values or binary values. 
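
The input transformation can be illustrated with the following sketch. The text does not state exactly how the scaling factor is derived from the quartiles, so the code assumes that it is the distance between the mean and the 75% quartile for values above the mean, and the distance between the mean and the 25% quartile otherwise. The function name and the "inclusive" quantile method are likewise assumptions.

```python
import math
import statistics


def make_input_activation(training_values):
    """Build the activation of one input neuron from its training data."""
    mean = statistics.fmean(training_values)
    q25, _, q75 = statistics.quantiles(training_values, n=4, method="inclusive")
    scale_above = abs(q75 - mean)   # assumed scaling factor above the mean
    scale_below = abs(mean - q25)   # assumed scaling factor below the mean

    def activation(s_i):
        scale = scale_above if s_i > mean else scale_below
        if scale == 0.0:
            return 0.0              # arbitrary guard for degenerate data
        return math.tanh((s_i - mean) / scale)

    return activation


activate = make_input_activation([1.0, 2.0, 3.0, 4.0, 5.0])
```

For these training values the mean is 3 and both quartile distances are 1, so an input of 4 maps to tanh(1) and every response lies strictly between -1 and +1.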

5.2.3. Intermediate layer

The stimulation signal $S_h$ for the neurons $N_h$ of the intermediate
layer is formed by the weighted sum of the response signals $A_i$ of
all neurons $N_i$ of the input layer:

Stimulation signal: $S_h = \sum_i w_{ih} \cdot A_i$

Said stimulation signal $S_h$ is transformed by the neurons $N_h$ in
accordance with a given activation function $F_h$, which may again be
the hyperbolic tangent function for example, into a response signal
$A_h$:

Response signal: $A_h = F_h(S_h - b_h)$
In the field of neural networks, the parameters $b_h$ are referred to
as the "bias" of the respective neuron. Like the values of the synapse
weights $w_{xy}$, the values of said bias parameters $b_h$ are also
determined during training of the neural network.

5.2.4. Output layer 

The stimulation signal $S_o$ and the response signal $A_o$ for a neuron
$N_o$ of the output layer are determined analogously:

Stimulation signal: $S_o = \sum_i w_{io} \cdot (A_i - c_i) + \sum_h w_{ho} \cdot A_h$

Response signal: $A_o = F_o(S_o - b_o)$

The parameters $b_o$ again indicate the "bias" of the neurons $N_o$ of
the output layer, while the parameters $c_i$ serve to adapt the
stimulation contributions of the neurons $N_i$ of the input layer and
$N_h$ of the intermediate layer. The values of both the parameters
$b_o$ and the parameters $c_i$ are determined during the training phase
of the neural network. With respect to the bias values $b_o$, it may be
favorable to require as a constraint that the response of all output
neurons $N_o$ averaged across the complete training dataset is zero.
The identity function $F_o(x) = x$ can be used as the activation
function $F_o$ for most applications, in particular for the present
case where the survival function is being determined for cancer
patients.

The response signals $A_o$ of the output neurons $N_o$ indicate the
respective coefficients of the associated terms of the series expansion
of the survival function sought.
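
The following sketch propagates a set of input responses through the simplified network of Fig. 1, in which the output neurons receive signals both directly from the input layer and from the single intermediate layer. It assumes the hyperbolic tangent for the intermediate layer and the identity function for the output layer. The nested-list layout of the weights and all parameter names are hypothetical.

```python
import math


def forward(a_in, w_ih, b_h, w_io, c_i, w_ho, b_o):
    """Return the responses of the intermediate and output neurons.

    a_in : responses of the input neurons
    w_ih : w_ih[i][h], weights from input neuron i to intermediate neuron h
    w_io : w_io[i][o], weights from input neuron i to output neuron o
    w_ho : w_ho[h][o], weights from intermediate neuron h to output neuron o
    b_h, b_o : biases; c_i : offsets applied to the input contributions
    """
    n_in = len(a_in)
    a_hid = []
    for h in range(len(b_h)):
        s_h = sum(w_ih[i][h] * a_in[i] for i in range(n_in))
        a_hid.append(math.tanh(s_h - b_h[h]))

    a_out = []
    for o in range(len(b_o)):
        s_o = sum(w_io[i][o] * (a_in[i] - c_i[i]) for i in range(n_in))
        s_o += sum(w_ho[h][o] * a_hid[h] for h in range(len(a_hid)))
        a_out.append(s_o - b_o[o])      # identity activation
    return a_hid, a_out


hidden, output = forward(
    a_in=[1.0, 0.0],
    w_ih=[[0.0], [0.0]], b_h=[0.0],
    w_io=[[2.0], [1.0]], c_i=[0.5, 0.0],
    w_ho=[[3.0]], b_o=[0.25],
)
```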

5.3. The survival function 

As already mentioned above, the input data comprise information about
the patient's personal data as well as information about the medical
condition. All these data are captured at a time t = 0, in the case of
cancer patients the time of the first operation for example. Following
the first operation, the patients then undergo a particular follow-up
treatment, which may include chemotherapy and/or radiotherapy for
example.

The survival function $S(t)$ indicates for a patient in question at a
time t the probability that a particular event has not yet occurred.
Said particular event may be, for example, a recurrence of cancer, or
also in the worst case the death of the patient. In any case,
$S(0) = 1$ holds for the survival function. In addition,
$S(\infty) = 0$ is usually assumed.

According to conventional notation, it is possible to define an event
density $f(t)$ and a risk function $\lambda(t)$ on the basis of the
survival function $S(t)$:

$f(t) = -\frac{dS}{dt}$

$\lambda(t) = \frac{f(t)}{S(t)}$

from which it follows that:

$\lambda(t) = -\frac{d}{dt}[\ln S(t)]$



If one knows the curve of the risk function $\lambda(t)$ therefore, it
is possible to reconstruct the curve of the survival function $S(t)$ by
means of integration.
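
Integrating the relation between the risk function and the logarithm of the survival function, with S(0) = 1, gives S(t) = exp(-∫ λ(u) du) over the interval from 0 to t. The sketch below evaluates this numerically with the trapezoidal rule. The function name, the step count and the constant example risk are assumptions chosen for the illustration.

```python
import math


def survival_from_risk(risk, t, steps=1000):
    """Reconstruct S(t) from a risk function by trapezoidal integration."""
    width = t / steps
    integral = 0.0
    for k in range(steps):
        left = risk(k * width)
        right = risk((k + 1) * width)
        integral += 0.5 * (left + right) * width
    return math.exp(-integral)


# A constant risk of 0.1 per time unit gives S(10) = exp(-1).
s_10 = survival_from_risk(lambda u: 0.1, 10.0)
```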

The task of the neural network is to model the curve of the risk
function $\lambda(t)$ in the same way as a series expansion:

$\lambda(t) = \lambda_0 \cdot \exp\left[\sum_o B_o(t) \cdot A_o\right]$



According to the above notation, the parameters $A_o$ denote the
response signals of the neurons $N_o$ of the output layer of the neural
network. In the context of the present invention, $\lambda_0$ is a
parameter independent of t which is used as a scaling factor. $B_o(t)$
denotes a set of functions that, as base functions of the series
expansion, enable a good approximation to the actual curve of the risk
function. It is possible to use for example the fractal polynomials or
else functions such as $t^p$ (where p is not necessarily an integer) as
the function set $B_o(t)$. $B_{o1}(t) = 1$;
$B_{o2}(t) = \mathrm{const} \cdot t^{\ldots}$, ... were used for the
present invention.
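
A risk function of this kind, a scaling factor multiplied by the exponential of a sum of base functions weighted by the output responses, can be evaluated as sketched below. The two base functions are purely illustrative: the constant term follows the text, whereas the square root is a hypothetical stand-in for a power of t with a non-integer exponent and is not taken from the source.

```python
import math


def risk(t, scale, output_responses, base_functions):
    """Evaluate the series-expansion model of the risk function at time t."""
    exponent = sum(b(t) * a for b, a in zip(base_functions, output_responses))
    return scale * math.exp(exponent)


# Hypothetical base functions: a constant and a non-integer power of t.
base = [lambda t: 1.0, lambda t: math.sqrt(t)]

flat = risk(4.0, 0.05, [0.0, 0.0], base)            # no output response
doubled = risk(4.0, 0.05, [math.log(2.0), 0.0], base)
```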

5.4. Training of the neural network - preparations 


5.4.1. The optimization function 

The training dataset comprises the data records of a plurality of
patients for whom not only personal data and information about the
medical condition, but also information about the type of follow-up
treatment and the further progress of the disease are known. From the
collected data relating to the further progress of the disease, an
"actual survival function" is constructed according to the following
rules: if the predetermined event, for example a relapse or the death
of the patient, has already occurred for a particular patient at a time
t, then his contribution $\delta$ to the "actual survival function" is
set to $\delta = 0$ before time t and to $\delta = 1$ at time t and
after time t. Patients for whom the predetermined event has not yet
occurred at the time the training dataset was created ("censored" data)
contribute only $\delta = 0$ to the "actual survival function" at all
times. During the training phase the weights $w_{xy}$ of the synapses
and the other optimization parameters set out in section 5.2. above are
then set in such a way that the survival function delivered by the
neural network optimally matches the "actual survival function".
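
The rule for the contribution of a single patient to the "actual survival function" can be written as a small function. Representing a censored patient by `None` as the event time is an assumption of this sketch, as is the function name.

```python
def contribution(event_time, t):
    """Contribution of one patient at time t.

    event_time : time at which the predetermined event occurred,
                 or None for a censored patient (assumed encoding)
    """
    if event_time is None:
        return 0                        # censored: 0 at all times
    return 1 if t >= event_time else 0  # 0 before the event, 1 from then on


timeline = [contribution(3.0, t) for t in (1.0, 3.0, 5.0)]
censored = [contribution(None, t) for t in (1.0, 3.0, 5.0)]
```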

This can be achieved, for example, by defining a suitable optimization
function $O$ for this purpose and searching for a local, in the most
favorable case even the global, minimum of said optimization function
in the space covered by the optimization parameters. To define the
optimization function $O$, it is already known in the prior art to
start from a so-called likelihood function $L$:

$O = -\ln L$

According to the invention,

$L = \prod_j [f_j(t)]^{\delta} \cdot [S_j(t)]^{1-\delta}$

is chosen to represent the likelihood function where, in accordance
with the notation introduced in section 5.3., $f_j(t)$ and $S_j(t)$
denote the event density and the survival function for the patient j of
the training set. Said likelihood function has the advantage that the
computational effort rises only approximately proportionately to the
number of patients included in the training dataset.
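
The sketch below computes the optimization function as the negative logarithm of a likelihood, assuming the usual form for censored data in which a patient with an observed event contributes the event density and a censored patient contributes the survival probability. Each record is a hypothetical triple of event density, survival probability and event indicator. A single pass over the records suffices, which is why the effort grows linearly with the number of patients.

```python
import math


def neg_log_likelihood(records):
    """Negative log-likelihood over (density, survival, indicator) triples."""
    total = 0.0
    for density, survival, indicator in records:
        total += indicator * math.log(density)
        total += (1 - indicator) * math.log(survival)
    return -total


# Two illustrative patients with constant risks 0.5 and 0.1.
records = [
    (0.5 * math.exp(-1.0), math.exp(-1.0), 1),   # event observed at t = 2
    (0.1 * math.exp(-0.3), math.exp(-0.3), 0),   # censored at t = 3
]
objective = neg_log_likelihood(records)
```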

Another way of representing the likelihood function is:

$L = \prod_j \left( \frac{\exp\left[\sum_o B_o(t) \cdot A_{oj}\right]}{\sum_l \exp\left[\sum_o B_o(t) \cdot A_{ol}\right]} \right)$

where the product is formed across all patients j for whom the
predetermined event has already occurred at time t, and where the first
sum in the denominator of the quotient is formed across all patients l
for whom the predetermined event has not yet occurred at time t.

The computational effort associated with this representation does
however rise approximately proportionately to the square of the number
of patients included in the training dataset.
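
The quadratic growth is visible in a direct implementation of a quotient-type likelihood: for every patient with an observed event, a sum over the patients still at risk has to be formed. In the sketch below, `score` is a hypothetical callable returning the exponent for a patient at a given time, and the risk set is assumed to contain every patient whose observation time is not earlier than the event time, which is the common convention but is not spelled out in the text.

```python
import math


def neg_log_partial_likelihood(times, events, score):
    """Negative log of the quotient-type likelihood (nested loop over patients)."""
    total = 0.0
    patients = range(len(times))
    for j in patients:
        if not events[j]:
            continue                     # only observed events enter the product
        t = times[j]
        at_risk = [l for l in patients if times[l] >= t]   # assumed risk set
        denominator = sum(math.exp(score(l, t)) for l in at_risk)
        total += score(j, t) - math.log(denominator)
    return -total


# With identical scores the value depends only on the sizes of the risk sets.
value = neg_log_partial_likelihood(
    times=[1.0, 2.0, 3.0],
    events=[True, True, False],
    score=lambda patient, t: 0.0,
)
```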

5.4.2. The initialization 
As is known per se in the prior art, to initialize the network
optimization parameters, for example the weights of the synapses
connecting the neurons, it is possible to assign stochastically to said
parameters small values that conform to certain normalization rules. It
is additionally possible here to include in the normalization findings
obtained in preliminary test runs on neural networks having a greatly
simplified structure.


5.5. Training the neural network - simplex method 

As is customary, the search for a local or the global minimum of the
optimization function is performed in several steps or cycles.
According to the invention, however, for the first time the simplex
method proposed by Nelder and Mead (see section 6. "References") is
used in a neural network for this search. A simplex in an n-dimensional
space is a structure with (n+1) vertices which surrounds the current
basepoint, i.e. a triangle in 2-dimensional space, a tetrahedron in
3-dimensional space and so forth. In what directions and at what
distances from the current basepoint the (n+1) vertices are arranged is
determined here from the vertices of the preceding cycle on the basis
of the characteristics of the optimization function.
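
For orientation, a compact variant of the Nelder-Mead simplex search is sketched below. It is a generic textbook-style version (reflection, expansion, inside contraction, shrink) with conventional coefficients, not the specific implementation of the invention. The function name, the initial step and the termination tolerance are assumptions.

```python
def nelder_mead(func, start, step=0.5, max_iter=500, tol=1e-10):
    """Minimize func over n variables using a simplex of n + 1 vertices."""
    n = len(start)
    simplex = [list(start)]
    for i in range(n):                   # one extra vertex per coordinate axis
        vertex = list(start)
        vertex[i] += step
        simplex.append(vertex)

    for _ in range(max_iter):
        simplex.sort(key=func)
        best, worst = simplex[0], simplex[-1]
        if abs(func(worst) - func(best)) < tol:
            break
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]

        def along(coef):
            # Point on the line through the centroid and the worst vertex.
            return [centroid[i] + coef * (worst[i] - centroid[i]) for i in range(n)]

        reflected = along(-1.0)
        if func(reflected) < func(best):
            expanded = along(-2.0)
            simplex[-1] = expanded if func(expanded) < func(reflected) else reflected
        elif func(reflected) < func(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(0.5)
            if func(contracted) < func(worst):
                simplex[-1] = contracted
            else:                        # shrink all vertices toward the best one
                simplex = [best] + [
                    [best[i] + 0.5 * (p[i] - best[i]) for i in range(n)]
                    for p in simplex[1:]
                ]
    simplex.sort(key=func)
    return simplex[0]


minimum = nelder_mead(lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2, [0.0, 0.0])
```

Only values of the function are used, never its derivatives, and the best vertex never gets worse from one cycle to the next.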

This method leads to a strictly monotonically decreasing sequence of
basepoints. It can be continued until either (within given precision
limits) a local or global minimum has been identified or another
termination criterion has been fulfilled. In connection with said
further termination criterion, the abovementioned validation dataset
now comes into play:

The abovementioned monotonic decrease in basepoints can arise on the
one hand from actually objectifiable characteristics of the
optimization function specified for the training dataset. On the other
hand, it is also possible that the decrease occurs in the range of a
valley of the optimization function caused by stochastic fluctuations.
The latter effect however only simulates a learning success. For this
reason, according to the invention the characteristics of the
optimization function specified on the basis of the validation dataset
are also investigated at the same basepoints. If it is then determined
that the basepoints of the "validation data optimization function" also
exhibit a monotonic decrease, then it can be assumed that one is still
in a "true" learning phase of the neural network. If on the other hand
the sequence of basepoints of the "validation data optimization
function" stagnates, or if it even rises again, it must be assumed that
with respect to the "training data optimization function" one is in a
valley caused by stochastic fluctuations which only simulates a
learning progress. The cyclical execution of the simplex method can
therefore be interrupted.
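
The termination criterion based on the validation dataset can be sketched as a check on the sequence of values that the "validation data optimization function" takes at successive basepoints. The function name and the optional `min_improvement` threshold are assumptions introduced for the illustration.

```python
def first_stalled_cycle(validation_values, min_improvement=0.0):
    """Return the index of the first cycle whose validation value no longer
    decreases by more than min_improvement, or None if it keeps decreasing."""
    for k in range(1, len(validation_values)):
        improvement = validation_values[k - 1] - validation_values[k]
        if improvement <= min_improvement:
            return k
    return None


stop_at = first_stalled_cycle([5.0, 4.0, 3.5, 3.6, 3.0])
keeps_learning = first_stalled_cycle([3.0, 2.0, 1.0])
```

In the first sequence the value rises again at the fourth basepoint, so the cyclical execution would be interrupted there even though a later value is lower.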

The main advantage of the simplex method is that 
it can be performed solely on the basis of the
optimization function, and also that the step length and 
step direction can be automatically specified. 

5.6. Training the neural network - structure simplification ("pruning")

Once the search for a local or the global minimum
has been completed, the next training step is to
investigate whether it is possible to simplify the
structure of the neural network on the basis of the
findings so far. This "pruning" is concerned with
investigating which of the synapses have so little
influence on the overall function of the neural network
that they can be omitted. In the simplest case this can
be done, for example, by permanently setting the weight
assigned to them to zero. However, in principle it is
equally conceivable to "freeze" the weight of the
respective synapse at a fixed value. It is advantageous
to alternate between simplex optimization steps and
structure simplification steps in an iterative process.
It would of course be desirable for the neural network to
undergo a new simplex optimization after every single
synapse that has been excluded. In view of the total time
required for training, however, this is unjustifiable. In
practice a favorable compromise has proved to be the
removal during a structure simplification step of at most
10% of the synapses still present at the beginning of
said step.
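
The alternation between optimization and simplification, with
the 10% limit per step, might be organized along the following
lines. This is a minimal sketch only: the callbacks optimize
and select_prunable, as well as max_rounds, are hypothetical
placeholders for the simplex optimization and for one of the
selection methods, and are not taken from this description.

```python
def alternate_training(active, optimize, select_prunable,
                       percent=10, max_rounds=20):
    """Alternate optimization and structure simplification.

    active: set of identifiers of the synapses still present.
    optimize(active): hypothetical callback, one simplex optimization.
    select_prunable(active): hypothetical callback returning removable
        synapses, most expendable first.
    """
    for _ in range(max_rounds):
        optimize(active)
        # At most `percent` % of the synapses present at the start of
        # this simplification step may be removed (integer arithmetic).
        budget = len(active) * percent // 100
        removed = select_prunable(active)[:budget]
        if not removed:
            break  # nothing left to remove, or budget is zero
        active = active - set(removed)
    return active
```

Note that with fewer than ten remaining synapses the budget
becomes zero under this rounding, so the loop ends; a real
implementation might choose a different rounding rule.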

According to the invention the two methods
described below in sections 5.6.1. and 5.6.2. are used
for structure simplification.

5.6.1. Likelihood method 

With this method the value of the likelihood
function is first calculated as a reference value on the
basis of the complete structure of the neural network in
its present state of training, i.e. using the current
values of the weights of all synapses. Following this,
the influence of a given synapse is suppressed, i.e. the
value of the weight of this synapse is set to zero. The
value of the likelihood function is then calculated for
the thus simplified network structure, and the ratio of
this value to the reference value is formed.

Once said likelihood ratio has been calculated
for all synapses, when performing the steps described
below, a start is made with the synapse for which the
value of the likelihood ratio is nearest to one:

Assuming that the network structure has already
been simplified by (x-1) synapses and the significance of
the xth synapse is now being investigated, then the
following three network structure variants are compared:
firstly the complete structure of the neural network in
its current state of training with all synapses still
present prior to this structure simplification step,
secondly the network structure excluding the (x-1)
synapses already suppressed in this structure
simplification step, and thirdly the network structure
now also excluding the xth synapse. Following this, using
a significance test the third structure variant is
compared firstly with the first structure variant
(complete structure) and secondly with the second
structure variant ((x-1) synapses suppressed). If even
just one of the two tests produces too great a deviation
from the third structure variant, then the respective
synapse is retained at least for the next simplex
optimization step.
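
The ranking by likelihood ratio and the comparison of the
three structure variants can be sketched as follows. The
sketch assumes that the weights are held in a dictionary and
that likelihood and deviates_too_much are callbacks supplied by
the user; these names, and the parameter budget, are
hypothetical and only stand in for the likelihood function and
the significance test of this description.

```python
def rank_candidates(likelihood, weights):
    """Order synapses by how close their likelihood ratio is to one."""
    reference = likelihood(weights)  # complete structure, current weights
    ratios = {}
    for key, value in weights.items():
        if value == 0.0:
            continue  # synapse already suppressed
        trial = dict(weights)
        trial[key] = 0.0  # suppress this one synapse
        ratios[key] = likelihood(trial) / reference
    return sorted(ratios, key=lambda k: abs(ratios[k] - 1.0))


def simplification_step(likelihood, weights, deviates_too_much, budget):
    """Suppress up to `budget` synapses, testing each against two variants."""
    full = dict(weights)      # variant 1: complete structure
    current = dict(weights)   # variant 2: (x-1) synapses suppressed
    removed = 0
    for key in rank_candidates(likelihood, weights):
        if removed >= budget:
            break
        candidate = dict(current)
        candidate[key] = 0.0  # variant 3: xth synapse also suppressed
        l_candidate = likelihood(candidate)
        # Retain the synapse if even one of the two tests fails.
        if (deviates_too_much(likelihood(full), l_candidate)
                or deviates_too_much(likelihood(current), l_candidate)):
            continue
        current = candidate
        removed += 1
    return current
```

Comparing against the complete structure as well guards
against a sequence of individually harmless removals that
together change the reaction of the network noticeably.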

The CHI-SQUARED test (cf. section 6.
"References", Document ), which is known per se, can
be used as a significance test, for example.
Alternatively, said significance test could also be
performed using the BOOT-STRAPPING method (cf. section 6.
"References", Document ), which is likewise known per
se. The use of the CHI-SQUARED test is particularly
favorable if the reaction of the neural network is
determined on the basis of a likelihood function. The
BOOT-STRAPPING method is also suitable with other types
of functions for representing the reaction of the neural
network.
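
One possible form of such a test is sketched below. It assumes,
beyond what is stated in this description, that the comparison
is made as a likelihood-ratio test in which twice the difference
of the log-likelihoods is referred to a chi-squared distribution
whose degrees of freedom equal the number of suppressed
synapses. The function names and the significance level alpha
are likewise assumptions made for this sketch.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus a finite series.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def removal_is_acceptable(loglik_reference, loglik_pruned, n_removed,
                          alpha=0.05):
    """True if the pruned variant does not deviate significantly."""
    statistic = 2.0 * (loglik_reference - loglik_pruned)
    return chi2_sf(statistic, n_removed) > alpha
```

For example, a drop of 0.5 in log-likelihood after suppressing
one synapse would be accepted at alpha = 0.05, whereas a drop
of 5.0 would not.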

5.6.2. Correlation method 

The exclusion or suppression of synapses
according to the correlation method is based on the
consideration that it could be possible for two neurons
located on one and the same layer to have qualitatively
the same influence on one neuron on a lower layer. In
this case the reaction of the neural network, or to be
more precise the response signal of said latter neuron,
should not change significantly if said neuron is
stimulated by only one of the two neurons located above
it, and the influence of the second neuron is taken into
account by strengthening the remaining synapse. It would
then be possible to omit the synapse leading from the
second neuron to the neuron in question.

a. Synapses connecting input neurons and output neurons

In accordance with section 5.2.4., the
contribution of the response signals of two input neurons
to the stimulation signal of an output neuron takes the
form:

    S_o = w_{1o} \cdot (A_1 - c_1) + w_{2o} \cdot (A_2 - c_2)

If one then assumes that the two response signals
A_1 and A_2 are correlated at least approximately to one
another in accordance with

    A_2 = m \cdot A_1 + n

and that the weight w_{1o} is greater than the weight w_{2o},
then the following holds for the stimulation signal S_o:

    S_o = (w_{1o} + w_{2o} \cdot m) \cdot A_1 + (n \cdot w_{2o} - w_{1o} \cdot c_1 - w_{2o} \cdot c_2)
        = w^*_{1o} \cdot (A_1 - c^*_1)

where

    w^*_{1o} = w_{1o} + w_{2o} \cdot m

and

    c^*_1 = -(n \cdot w_{2o} - w_{1o} \cdot c_1 - w_{2o} \cdot c_2) / (w_{1o} + w_{2o} \cdot m)


If w^*_{1o} is not small, the behavior of the neural
network can be tested with the following assumptions:

1. Replace the weight w_{1o} by w^*_{1o};
2. Replace the parameter c_1 by c^*_1; and
3. Suppress the synapse from the input neuron N_2 to the
   output neuron N_o.

If the outcome of this test, which can again be
performed as a CHI-SQUARED test for example, is positive,
then it is possible to omit the synapse from the input
neuron N_2 to the output neuron N_o.
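
The replacement of the two synapses by a single one can be
checked numerically as in the following sketch. The
least-squares fit used to obtain m and n is an assumption (the
description does not state how the linear relation is
determined), and the function names are chosen for this
illustration only.

```python
def fit_line(a1, a2):
    """Least-squares m, n such that a2 ~ m * a1 + n (assumed method)."""
    k = len(a1)
    mean1 = sum(a1) / k
    mean2 = sum(a2) / k
    sxx = sum((x - mean1) ** 2 for x in a1)
    sxy = sum((x - mean1) * (y - mean2) for x, y in zip(a1, a2))
    m = sxy / sxx
    return m, mean2 - m * mean1


def merge_output_synapses(w1, c1, w2, c2, m, n):
    """Return (w_star, c_star) replacing both input-to-output synapses."""
    w_star = w1 + w2 * m
    # c_star follows from S_o = w_star * (A_1 - c_star); w_star must
    # not be small, otherwise this division is ill-conditioned.
    c_star = -(n * w2 - w1 * c1 - w2 * c2) / w_star
    return w_star, c_star
```

If the response signals are exactly linearly related, the
merged synapse reproduces the original stimulation signal; for
only approximately correlated signals the residual deviation is
what the subsequent significance test has to judge.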

b. Synapses connecting input neurons and intermediate neurons

The contribution of the response signals of two
input neurons to the stimulation signal of an
intermediate neuron can also be treated analogously, in
which case it is advisable, for reasons that will become
immediately apparent below, to treat the stimulation
signal of the intermediate neuron including its "bias":

    S_h - b_h = w_{1h} \cdot A_1 + w_{2h} \cdot A_2

If one again assumes that the two response
signals A_1 and A_2 are correlated at least approximately to
one another in accordance with

    A_2 = m \cdot A_1 + n

and that the weight w_{1h} is greater than the weight w_{2h},
then the following holds for the stimulation signal S_h:

    S_h - b_h = (w_{1h} + w_{2h} \cdot m) \cdot A_1 + n \cdot w_{2h}

or

    S_h - b^*_h = w^*_{1h} \cdot A_1

where

    w^*_{1h} = w_{1h} + w_{2h} \cdot m

and

    b^*_h = b_h + n \cdot w_{2h}

If w^*_{1h} is not small, the behavior of the neural
network can be tested with the following assumptions:

1. Replace the weight w_{1h} by w^*_{1h};
2. Replace the bias b_h by b^*_h; and
3. Suppress the synapse from the input neuron N_2 to the
   intermediate neuron N_h.

If the outcome of this test, which can again be
performed as a CHI-SQUARED test for example, is positive,
then it is possible to omit the synapse from the input
neuron N_2 to the intermediate neuron N_h.
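
For the intermediate neuron the offset n is absorbed into the
bias, which the following short sketch makes explicit. The
function name is hypothetical, and m and n are assumed to have
been determined beforehand.

```python
def merge_hidden_synapses(w1, w2, bias, m, n):
    """Return (w_star, b_star) for an intermediate neuron.

    Uses S_h - b_h = w1 * A_1 + w2 * A_2 with A_2 = m * A_1 + n, so
    that the constant part n * w2 is moved into the bias.
    """
    w_star = w1 + w2 * m
    b_star = bias + n * w2
    return w_star, b_star
```

Substituting the returned values into S_h = b_star + w_star * A_1
gives the same stimulation signal as the original two synapses
whenever A_2 = m * A_1 + n holds exactly.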

c. Synapses connecting intermediate neurons and output neurons

Synapses leading from intermediate neurons to
output neurons can also be treated analogously. With
respect to the bias values b_o, however, the further
constraint mentioned in section 5.2.4. may need to be
taken into account.

5.6.3. Testing the topology

The above-described pruning of the structure of
the neural network can result in individual neurons no
longer being connected to any other neurons. This is the
case for example if an input neuron is not connected to
any intermediate neuron nor to any output neuron, or if
an output neuron is not connected to any intermediate
neuron nor to any input neuron. It is therefore only
logical to completely deactivate those neurons that no
longer have any influence on the function of the neural
network.

Intermediate neurons that are still connected to
neurons on the input layer but not to neurons on the
output layer constitute a special case. Said intermediate
neurons can no longer exert any influence on the function
of the neural network. The synapses leading from the
input layer to these intermediate neurons can therefore
also be suppressed, i.e. the weights of said synapses can
be set to zero.

The converse case can however also occur, namely
that an intermediate neuron is still connected to the
output layer, but no longer has any connection to the
input layer. At best such an intermediate neuron can output
to the output neurons a response signal that is dependent
on its "bias". However, a signal of this type has no
information content whatsoever that would be significant
for the function of the neural network. It is therefore
also possible to suppress the remaining synapses of said
intermediate neurons.
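
The topology test described in this section can be sketched as
follows. The representation of synapses as (sender, receiver)
pairs and the function name are assumptions made for the
purpose of illustration.

```python
def clean_topology(inputs, hidden, outputs, synapses):
    """Suppress dead-end synapses and report neurons to deactivate.

    synapses: iterable of (sender, receiver) pairs (assumed layout).
    Returns the remaining synapses and the set of unconnected neurons.
    """
    synapses = set(synapses)
    changed = True
    while changed:  # repeat, since one removal can expose another
        changed = False
        for h in hidden:
            incoming = {s for s in synapses if s[1] == h}
            outgoing = {s for s in synapses if s[0] == h}
            # An intermediate neuron connected on one side only carries
            # no information: suppress its remaining synapses.
            if (incoming or outgoing) and not (incoming and outgoing):
                synapses -= incoming | outgoing
                changed = True
    connected = {neuron for s in synapses for neuron in s}
    inactive = (set(inputs) | set(hidden) | set(outputs)) - connected
    return synapses, inactive
```

The loop is repeated until nothing changes because, in a
network with several intermediate layers, suppressing the
synapses of one intermediate neuron can leave another one
connected on one side only.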

5.7. Generalization

On completion of the training phase it is
necessary to test the performance of the trained neural
network to obtain a measure of how meaningful the
survival functions delivered by this neural network
actually are. The abovementioned generalization dataset,
which had no influence whatsoever on the training of the
neural network and thus enables objective results, is
used for this purpose.

5.8. Concluding remarks

In conclusion it should be mentioned that, in
addition to the tumor-specific factors uPA and PAI-1
explicitly mentioned above which allow statements to be
made about invasion, it is also possible to take further
such factors into account. Among others, these include
factors for proliferation, for example the S phase and
Ki-67, and other processes that influence tumor growth.

Claims

1. A method for training a neural network to
determine risk functions for patients following a first
occurrence of a predetermined disease on the basis of
given training data records containing objectifiable and
metrologically captured data relating to the medical
condition of the patient, wherein the neural network
comprises:

an input layer having a plurality of input neurons,
at least one intermediate layer having a plurality
of intermediate neurons,
an output layer having a plurality of output
neurons, and
a multiplicity of synapses which interconnect two
neurons of different layers in each case,

characterized in that the training of the neural network
comprises a structure simplification procedure, that is
to say the location and elimination of synapses that have
no significant influence on the curve of the risk
function, in that

a1) one selects two sending neurons that are connected
to one and the same receiving neuron,

a2) one assumes that the signals output from said
sending neurons to the receiving neuron essentially
exhibit the same qualitative behavior, that is to
say are correlated to one another,

a3) one interrupts the synapse of one of the two sending
neurons to the receiving neuron and instead adapts
accordingly the weight of the synapse of the
respective other sending neuron to the receiving
neuron,

a4) one compares the reaction of the neural network
changed in accordance with step a3) with the
reaction of the unchanged neural network, and

a5) if the variation of the reaction does not exceed a
predetermined level, one decides to keep the change
made in step a3).

2. The method as claimed in claim 1, characterized 
in that the two sending neurons are located on one and 
the same layer. 

3. The method as claimed in claim 1 or 2,
characterized in that furthermore the value of the bias
of the receiving neuron is adapted in step a3).

4. A method for training a neural network in
accordance with the preamble of claim 1 and if desired
with the characterizing parts of any of claims 1 to 3,
characterized in that the training of the neural network
comprises a structure simplification procedure, that is
to say the location and elimination of synapses that have
no significant influence on the curve of the risk
function, in that

b1) one selects a synapse,

b2) one assumes that said synapse does not have a
significant influence on the curve of the risk
function,

b3) one interrupts said synapse,

b4) one compares the reaction of the neural network
changed in accordance with step b3) with the
reaction of the unchanged neural network, and

b5) if the variation of the reaction does not exceed a
predetermined level, one decides to keep the change
made in step b3).

5. The method as claimed in claim 4, characterized
in that, when in the course of the structure
simplification procedure n-1 synapses have already been
eliminated and the strength of the influence of an nth
synapse is being tested, the reaction of the neural
network reduced by n synapses is not only compared with
the reaction of a network reduced by only n-1 synapses,
but also with the reaction of the neural network with its
complete structure as present at the beginning of said
structure simplification procedure, and in that the
elimination of the nth synapse is only retained if the
deviation of the reaction does not exceed a predetermined
level for both comparisons.

6. The method as claimed in any of claims 1 to 5,
characterized in that the value of a likelihood function
is calculated for the neural network to represent the
reaction of the neural network.

7. The method as claimed in any of claims 1 to 6,
characterized in that the structure variants of the
neural network are compared using a significance test.

8. The method as claimed in claim 7, characterized
in that the structure variants of the neural network are
compared using the CHI-SQUARED test which is known per
se.

9. The method as claimed in claim 1, characterized
in that the structure variants of the neural network are
compared using the BOOT-STRAPPING method which is known
per se.

10. The method as claimed in any of claims 1 to 8,
characterized in that, to compare two structure variants
of the neural network, the ratio of the values of the
likelihood functions for said two structure variants is
calculated.

11. A method for training a neural network in
accordance with the preamble of claim 1 and if desired
with the characterizing parts of any of claims 1 to 10,
characterized in that the training of the neural network
comprises an optimization procedure in which the
strengths of the individual synapses, that is to say the
strengths of the connections between the neurons, are
optimized, and in that the simplex method which is known
per se is used for said optimization.

(54) Title: METHOD FOR TRAINING A NEURAL NETWORK
(54) Bezeichnung: VERFAHREN ZUM TRAINIEREN EINES NEURONALEN NETZES





(57) Abstract: The object of the inventive method is to train a neural network to determine risk functions in patients following
a first occurrence of a predetermined illness, based on predetermined training data sets. Said predetermined data sets
contain objectifiable information which can be used as a yardstick to measure the pathological condition of the patient. The neural
network contains a plurality of neurons arranged in several layers, in addition to synapses connecting said neurons. In the course of
the training, the structure of the neural network is simplified by tracking and eliminating the synapses which play no significant role
in the evolution of the risk functions. This can be done, for example, by examining a possible correlation between the influences
which two sending neurons have on the same receiving neuron, and where possible, eliminating one of the two synapses connecting
to the receiving neuron.



(57) Zusammenfassung: The method according to the invention serves to train a neural network to determine risk
functions for patients following a first occurrence of a predetermined disease on the basis of given
training data records containing objectifiable and